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Abstract — We propose a general framework for solving quan- 
tum state estimation problems using the minimum relative 
entropy criterion. A convex optimization approach allows us to 
decide the feasibility of the problem given the data and, whenever 
necessary, to relax the constraints in order to allow for a physi- 
cally admissible solution. Building on these results, the variational 
analysis can be completed ensuring existence and uniqueness of 
the optimum. The latter can then be computed by standard, 
efficient standard algorithms for convex optimization, without 
resorting to approximate methods or restrictive assumptions on 
its rank. 

Index Terms — Quantum estimation, Kullback-Leibler diver- 
gence, Convex optimization 



I. Introduction 

Quantum devices implementing information processing 
tasks promise potential advantages with respect to their clas- 
sical counterparts in a remarkably wide spectrum of applica- 
tions, ranging from secure communications to simulators of 
large scale physical systems 0, 0, 0. 

In order to exploit quantum features to the advantage of a 
desired task, tremendous challenges are posed to experimental- 
ists and engineers, and many of these have stimulated substan- 
tial theoretically-oriented research. Which particular problem 
is critical depends on the physical system under consideration: 
from optical integrated circuits to solid-state devices, the 
tasks in the device engineering, protection from noise and 
control are manifold 0, 0, 0, 0, 0- However, quantum 
estimation Q problems are ubiquitous in applications, be it in 
testing the output of a quantum algorithm, in reconstructing the 
behavior of a quantum channel or in retrieving information at 
the receiver of a communication system 0, 0, 0, ifTol . ifTTI . 
Ifl2l . In this paper we focus on state estimation problem for 
finite-dimensional quantum system, namely the reconstruction 
of a trace-one, positive semidefinite matrix given from data, 
and in particular on an estimation method that addresses two 
critical problems for most real-world situations. The first arises 
when only a small set of potentially noisy data is available, 
yielding no physically-acceptable solution; the second regards 
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situation in which the system dimension is large and the 
available measurement are insufficient to completely determine 
the state. 

To address this second issue, a typical approach in both 
the classical and the quantum world is to resort to a MAX- 
ENT principle HI, OH, El, COD, 0, ED, where one 
opts for a "maximum ignorance" criterion on the choice of 
parameters that are not uniquely determined by data. The 
MAXENT estimation can indeed be seen as a particular 
case of minimum relative entropy estimation, (QjO, where the 
information-theoretic pseudo-distance of the estimated state 
with respect to some a priori state is minimized subjected 
to a set of constraints representing the available data |[T9l . 
GO), Oil . ll22l . ll23l . This a priori information introduces a 
new ingredient with respect to typical maximum-likelihood 
methods for quantum estimation 0, 0. 

A quantum minimum relative entropy method for state 
estimation has been discussed in ETI . Il24l . where approximate 
solutions to minimum relative entropy problems are provided: 
the estimates are shown to be good approximation of the 
optimal solution when this is close enough to the prior. On the 
other hand, a way towards the computation of the exact opti- 
mal solution is indicated in 0191 : Georgiou has analyzed the 
MAXENT problem for estimating positive definite matrices, 
providing a generic form for the optimal solution, parametric 
in the Lagrange multipliers. He has also observed that the 
results can be extended to the more general minimum relative 
entropy problem. 

We shall here extend Georgiou's approach, proving exis- 
tence, uniqueness and continuity of the solution with respect 
to the measured data, when a generic prior is considered. 
The solution can then be computed by standard numerical 
methods. However, the approach returns a meaningful answer 
only when there is a full-rank admissible solution among the 
states compatible with the data. While this appears to be a 
reasonable assumption as quantum full-rank states are generic, 
this is no longer the case whenever the unknown state is pure 
or near the boundary of the physical state set. In fact, it is 
easy to picture realistic scenarios where the effect of noisy or 
biased measurements might actually force the solutions to be 
on the bounduary, or even outside of the admissible set ll25l . 
0. In the latter case the constrained optimization problem "as 
is" is not feasible, and one has to relax the constraints. 

Our strategy to solve these issues is organized as follows: a 
general setting for posing the feasibility problem and quantum 
minimum relative entropy with data corresponding to linear 
constraints is presented in Section HI] In Section Hill we 
propose a way to reformulate the feasibility analysis as a 
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convex optimization problem. The solution of this ancillary 
problem, for which we provide a numerical approach in 
Appendix [A] also indicates an optimal way to perturb, or 
relax the constraints in order to allow for admissible solutions 
to the estimation problem. We also show how the way in 
which the constraints are relaxed can be tailored to match 
the error distribution or the level of noisy we assume on the 
measurements. Once the feasibility analysis returns a positive 
answer, our approach directly leads to the construction of a 
reduced problem for which there exists a positive definite state 
satisfying the given constraints. In Section [TV] we address the 
corresponding (reduced, if needed) minimum relative entropy 
problem, showing it admits a unique full-rank solution. The 
latter can be computed from the closed-form solution of the 
primal problem, and a standard numerical algorithm to find 
the corresponding Lagrange multipliers is suggested. Then, the 
solution to the non-reduced, original problem is immediately 
obtained. Some concluding remarks and future directions and 
applications are summarized in Section [V] 

II. Problem Setting 

A. Quantum States and Measurement Data 

Consider a quantum n-level system. Its state is described by 
a density operator, namely by a positive semidefinite unit-trace 
matrix 

p E V n = {p G C" x " \p = p t >0, tr(p) = 1} , (1) 

which plays the role of probability distribution in the classical 
probability framework. Note that a density matrix depends on 
n 2 — 1 real parameters. 

In this work we will be concerned on the problem of recon- 
structing an unknown p from a set of repeated measurement 
data. This is of course an estimation problem in the statistical 
language, while in the physics community it is usually referred 
to as state tomography Q. 

We assume that data are provided in one of the following 
forms: 

1) Outcome frequencies for projective measurements: con- 
sider repeated measurements of a (Hermitian) observable l26l . 
O = J2k °k^k, where {11^} is the associated spectral family 
of orthogonal projections. The spectrum {ok} represents the 
possible outcomes at each measurement, and the frequency 
of the fc-th outcome given a state p can be computed as 
Pk = tr(pHk)- After K measurements of O, we assume we 
are provided with some experimental estimates of pk, i.e. the 
experimental relative frequencies of occurences pk = #(0 = 
Ok)/K, with #(0 = Ok) the number of measurements that 
returned outcome Ok- 

2) Observable averages: consider a set of n Q measured 
observables, represented by Hermitian matrices Oi, where now 
we only have access to the mean values of the outcomes, 
denoted by (Oi) (and with possible outcomes o^fc), that can be 
theoretically computed as (Oi) := tr(pOi) and experimentally 
estimated by (Oi) = Y,k°i,kPk- 



3) Outcome frequencies for general measurements: con- 
sider repeated measurements of a Positive-Operator Valued 
Measure (POVM), that is generalized measurements that can 
be used to describe indirect measurements on a system of 
interest 0. A POVM with M outcomes, say k = I,... M, 
is associated to a set of non-negative operators {Qk}lLi sucn 
that J2k = I, playing the role of resolution of the identity 
for projective measurements. The probability of obtaining the 
fc-th outcome can be computed by qk = tr(pQk), and ex- 
perimentally estimated by qk after K repeated measurements. 
This case in fact includes the first one, and the generalization 
to multiple POVM is straightforward. 

In all these scenarios, data are provided as a set of real val- 
ues representing estimates fi of quantities fi (that can be either 
Pi, (Oi) or qi), each associated to the state through a linear 
relation of the form /j = tr (pZi), where Zi have the role of 
IT, Oi or Qi described above. Clearly f — > fi = tr(Zip) with 
probability one as N — s- oo. This framework is quite general, 
and can be adapted to include any case if the data are given as 
linear constraints. Another significant situation that fits in this 
framework is when reduced states of a multipartite systems 
are available as data 11271 . ll28l . Finally, by the well-known 
Choi-Jaimlokowskii isomorphism |29], the same setting, and 
methods for solution, can be adapted to include estimation of 
quantum channels, or quantum process tomography J7], fl9]- 

From a theoretical viewpoint, p can be in principle recon- 
structed exactly from at least n 2 — 1 averages fi = tr(pZi) 
i = 1 ... n 2 — 1 when Z\ . . . Z n i_\ are observables which 
do not carry redundant information, namely they form a basis 
for the space of traceless Hermitian matrices. In any practical 
application, however, one has to face the following issues: 

1) Accurate estimates fi of f are only obtained by aver- 
aging over a large quantity of trials; Often only a small 
set of trials is available, and/or the data are subject to 
significant errors; 

2) The number of observables required for a unique recon- 
struction of p grows quadratically with respect to the 
dimension of the quantum system, and exponentially in 
the number of subsystems. Typically only a small subset 
of these is available; 

We here analyze the estimation problem when these two 
aspects are taken into account. The first one will lead us 
to consider the feasibility problem, that is, if the problem 
admits a physically admissible solution for the given data. 
Since errors may affect the f, the reconstructed state may 
not be positive semidefinite, or a valid state that satisfies the 
constraints might not even exist. The second issue generically 
leads to a estimation problem where more than one state 
satisfy the constraints, and thus an additional criterion has 
to be introduced to arrive at a unique solution. As we said, 
a typical strategy in this setting is to introduce an entropic 
functional, e.g. relative entropy with respect to some reference 
state representing a priori information. 

B. Statement of the Main Problems 

Consider the setting described above, where we want to 
estimate the state of an n-dimensional quantum systems from 
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the real data {/j}f_ 2 , experimental estimates of the quantities 
fi = tr(Zip), for the Hermitian matrices Z2 . ■ . Z p , with 
p <C n 2 — 1. In addition to these, we introduce an auxiliary 
observable Z± = I and the corresponding estimate fx = 1. 
In this way, we include the linear constraint tr(p) = 1 in the 
constraints associated with the "data". We wish now to solve 
the following problem. 

Problem 1: Given {Zi} and {fi}, i = 1 . . .p, find: 

p € H n , such that p > 0, fi = tr(pZi), i = 1, . . . ,p. 

(2) 

Here, H n denotes the vector space of Hermitian matrices of 
dimension equal to n. Notice that, if we remove the positivity 
constraint p > 0, all other constraints are linear and identify a 
hyperplane in W n . To our aim it is convenient to first address 
a simpler problem: let 

S:={peH n \p>0, fi = tr(pZi)} 

be the set of the density matrices which solve Problem Q] 
Problem 2 (Feasibility): Determine if S is not empty. 
When the problem is feasible, in general S contains more 
than one solution, and in principle any solution in S fits the 
data. We focus on choosing a solution that has minimum 
distance with respect to an a priori state. In the same spirit 
of MAXENT problem, this corresponds to give maximum 
priority to fitting the data, and then choosing the admissible 
solution that is the closest (in the relative-entropy pseudo- 
distance) to our a priori knowledge on the systems. To 
this aim, consider the (Umegaki's) quantum relative entropy 
between p € T> n , and t <G int(2? n ) l30l : 

§Go||t) = tr(plog/9 - plogr). (3) 

Assuming r in the interior and with the usual convention that 
log(0) = 0, we do not have to worry about unbounded values 
afS(p||r). 

Problem 3 (Minimum relative entropy estimation): Given 
the observables Z\ . . . Z p and the corresponding estimates 
fx... f p , solve 

minimize §(p||r) subject to tr(pZj) = fi, i = 1 . . .p. (4) 

p>0 

Here, r represents the a priori information on the considered 
quantum system. We set a = I if no information is available. 
In this situation, (O is the opposite of the quantum entropy of 
p, and we obtain a MAXENT problem. Note that, the solution 
to the Problem above may be singular. 

III. Feasibility Analysis 
A. An auxiliary problem 

We start by addressing the feasibility problem, i.e. to 
determine when S is not empty. In addition, whenever the 
problem is not feasible, we show how to determine a suitable 
perturbation of the {/;} that makes our problem feasible. We 
will show that the corresponding S only contains singular 
density matrices when the constraints are relaxed. 

Note that, the constraints are linear in fi and Zi, and can 
be linearly combined: afi + f3fk = tr[p(aZi + f3Zk)] for 
each a, f3 G ffi \ {0} and i, k = 1 . . .p. Consider the vector 



space generated by the observed operators, span{Zi . . . Z p }. 
Thus, by applying the Gram-Schmidt process, starting with 
X\ = -^Zi = -^1, we can compute an orthonormal basis 
for it: 

i-l 

Xi — aiZi + ^ajXi, i = 2...m<p. (5) 
1=1 

By linearity, by associating these basis elements to the esti- 
mates 




4-1 

fi ■= (4fi + Y^ a ifh i = 2...m, (6) 

1=1 

we obtain a new yet equivalent set of constraints: 

fi = ti(pXi), i = l. ..m. (7) 

Note that, / e span {^"1 . . . X m }. 

Let Yx...Y n 2_ m be an orthonormal completion of 
Xi . . .X m to a basis of H n . Accordingly, all the Hermitian 
matrices, and in particular density operators, can be expressed 
as 

m n 2 — m 

p = J^aiXi+ J2 & Y i> ( 8 > 

i=l i=l 

with tr(pXi) = on. In particular, all the Hermitian matrices 
satisfying the linear constraints in © depend on n 2 — m 
parameters (3 = . . . (3 n 2_ m j: 

n 2 — m 

P = P0+ W 

i=l 

where we have defined the (not necessarily positive) pseudo- 
state associated to the constraints: 

m 

p = J2fi X i- ( 10 > 

i=l 

In the light of this observation, the feasibility problem 
consists in checking if there exists at least one vector j3 £ 
1™ ~ m such that p > 0. To this aim, we introduce an 
auxiliary problem. Intuitively, the idea is the following: given 
any Hermitian matrix po, there always exists a real p such 
that po + pi is positive definite. More precisely, if po is 
not positive definite already, it is easy to see that such a p 
will need to be positive. On the other hand, if po is already 
positive, the perturbed matrix remains positive semi-definite 
for some small, negative p. Studying the minimal p that 
correspond to a positive semidefinite matrix offers us a way 
to understand whether our constraints allow for physically 
admissible solutions. 

Let us formalize these idea: we define c := 
[ ... 1 ] T € R" 2 - m+1 , and 

n 2 —m 

H(v) := Po + ^2 ViYi + v n 2_ m+1 Xi 

i=l 

withu= [ v\ ... v n 2_ m+1 ] and we consider the follow- 
ing minimum eigenvalue problem. 
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Problem 4: Given po as in ( fTOT > and Y\...,Y n i_ m an 
orthonormal completion of span{Xi . . .X m }, solve 

minimize c T v subject to v G Z := {v \ H(v) > 0} . 

(11) 

Lemma 3.1: Problem |4] always admits solution. 

Proof: First of all, notice that Problem |4] is a convex 
optimization problem, and the objective function c T v is linear 
and continuous over the set Z. Then, the proof is divided in 
three steps. 



Step 1: We show that c v 



J n 2 —m+l 



is bounded from below 



on I: since X\, . . . , X m , Y\, . . . , Y n i_ m forms an orthonormal 
basis and / G span {Xi, . . . , X m }, the matrices {Y,} are 
traceless. Thus, 



tr 



H(v)] = tr[p + + u n 2 -m+l^l] 



and tr[H(v)] 



tr(po) + \Znu„2_ OT+1 = 1 + \/reV-m+i 



G Z. Hence, 



T 

c v 



-m+l 



> for each v 

> — 7= for each uel 

— y'n — 

Sfep 2: Let us consider w = 
[ ... \Zn(-X min (p ) + 1) ] G I where A min (p ) 
denotes the minimum eigenvalue of po. Accordingly, 
c T w = v^l - ^min(po) + 1) an d Problem [4] is equivalent to 
minimize c T v over the closed sublevel set Iq = {v\ H(v) > 



0, 



< % 



-i < ^(-A min (p ) + 1)} C Z. We 



oo as 

n 2 — m+l 



want to show that Zo is bounded and accordingly compact 
(recall that we are working in a finite dimensional space). 
This can done by proving that a sequence {v k } k>Q such that 
\\v k \\ -+ oo cannot belong to Zo. It is therefore sufficient to 
show that the minimum eigenvalue of the associated Hermitian 
matrix H(v k ) tends to — oo as \\v k \\ —> oo with v m 2_ n+1 
bounded. Note that the affine map v i-» H(v) is injective, 
since Yi . . . Y n 2_ m , X\ are linearly independent. Accordingly 
j|i/(v fe )|| — >• oo as Hw^ll — > oo. Since H(v k ) is an Hermitian 
matrix, H(v_ k ) has an eigenvalue r/k such that \rjk\ —J 
||v fc j| — > oo. By construction tr[H(v k )} = 1 + \fnv k 
and v k 2 _ m+1 is bounded in Zo- Thus tr[if (w fe )] < oo, namely 
the sum of its eigenvalues is always bounded. Thus, there 
exists an eigenvalue of H(v k ) which approaches — oo as 
k oo. So, Zo is bounded. 

Step 3: Since c T v is continuous over the compact set Zo, by 
Weiestrass' theorem we conclude that c T v admits a minimum 
point over Zo. ■ 

We need to take into account that the vector which mini- 
mizes c T v over Z may not be in general unique. However, to 
our aim we are more interested in the sign of the minimum. 

Proposition 3.1: Let n = mmc T u. Then, the following 

vex 

facts hold: 

1) If p > 0, then Problem Q] is not feasible 

2) If /i < 0, then Problem [T] is feasible and there exists at 
least one positive definite matrix satisfying constraints 
in @ 

3) If p = 0, then Problem [T] is feasible and all the matrices 
satisfying constraints in are singular. 



Proof: Note that G(«i , . . . , v n 2_ rn ) := p + Ya=i ™ 
represents the parametric family of Hermitian matrices (not 
necessary positive semidefinite) satisfying constraints in (0. 
Define e := —v n 2_ m+1 , thus Problem [4] can be rewritten in 
the following way: 

maximize e subject to G(v-y,.. 



i Vn 2 —m 



(12) 



Let e° = —p be the solution of the above problem. If 
e° < 0, the parametric family G{v\,..., v n 2_ m ) does not 
contains positive semidefinite matrices, accordingly Problem 
Q] does not admit solution. If e° > 0, the parametric family 
G(vi, . . . , w„2_ m ) contains at least one positive semidefinite 
matrix and Problem [T] admits solution. Moreover if e° > 0, the 

o 

parametric family contains at least one matrix p > 4=J > 
which is positive definite. On the contrary, for e° = there 
only exist positive semidefinite matrices which are singular. 



An effective numerical approach for the solution to the 
problem is described in Appendix [A] 

In the light of the previous result, if p < Problem [T] is 
feasible and S contains at least one positive definite solution. 
As we will see in Section IIVI this condition ensures that a 
minimum relative entropy criterion will lead to an admissible 
solution in S. The remaining cases need to be studied more 
carefully. We start by showing how to make Problem [T]feasible 
when it is not be so for the given constraints. It turns out that 
a minimally relaxed problem is feasible and S only contains 
singular density matrices. Next, we deal with the case in which 
Problem[T]is feasible and all its solutions are singular, showing 
how they all share a minimal kernel and how to construct 
a reduced problem with a full-rank solution for which the 
minimum relative entropy methods work. 

B. Forcing the feasibility condition (case p > 0) 

The parameter p given by the auxiliary problem described 
above reveals if the original problem is feasible, but also 
suggests an "optimal" way to relax unfeasible constraints so 
that they make Problem [T] feasible. In fact, from the definition 
of p, we know that there exist v\ . . . v n 2 _ m G M such that 



Po 



J2 ViYi + pX x > 0, 



and: 



tr(p (ti ) = 1 + sfnp. 



From this positive operator, in order to obtain a density 
operator, we only need to normalize the trace by defining: 

P := i I Pt* ( 13 ) 



l + V^' 



y/np 



X; 



. l + A 



Vnp 



This implies that the original problem can be made feasible by 
uniformly, "isotropically" contracting the data {/{} of a factor 
1/(1 + \frip) and, in light of the fact that p is a solution to 
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Problem[4] that this is the minimum amount of contraction that 
makes Problem [TJ feasible. Moreover, the corresponding set S 
only contains singular solutions. 

However, the entries in the data set {fa} may differ in 
their reliability, and one would like to be able to relax the 
corresponding constraints accordingly. This is complicated by 
the fact that the original {Zi} may not be orthogonal, and the 
data we are contracting are in fact the linearly transformed 
output of the Gram-Schmidt orthonormalization described 
above. 

This weighed relaxation can be realized as follows: consider 
the initial setting of Section |IIJ where we have p observables 
Z\ . . . Z p (not necessarily orthonormal), with Z\ = I and 
fi = 1. Define the reliability indexes < d 2 . . . d p < 1 
associated to each observable Z 2 . . . Z p . More precisely, the 
more fa is reliable, the closer to one is. This information 
can be extracted, for example, from an error analysis on the 
measurement procedures, with di associated to the normalized 
reciprocal of the variances. 

When we obtain the orthonormal generators X\ . . . X m , the 
Gram-Schmidt process induces a linear transformation on the 
original estimates fi,.. - ,f p : 



(14) 



" h ' 




' h ' 




= T 




fm 




. f P . 



where T = 







l 

Ti T 2 



pmxp 



In order to modify the data {fi} according to their reliability 
indexes, we define the new set of data: 



fi 



f' 



= T 



kd 2 f 2 
kdpfp 



where k > maxjd^ 1 }. In this way f 2l ■ ■ ■ , f p are amplified 
of a factor kdi > 1 according their reliability indexes. This 
will allow for the maximum contraction to be applied to the 
most noisy estimates. 

In order to compute the minimum p, that makes the original 
problem feasible perturbing the data consistently with their 
reliability indexes, we can solve Problem |4] with respect to 
the new pseudo-state: 



Po 



i=l 



fcXi- 



(16) 



It is easy to see that if the original problem was unfeasible, 
this modified problem is unfeasible as well for k large enough, 
since all the fa corresponding to traceless operators are mul- 
tiplied for a factor ifd, >> 1. Let p! > be the parameter 
given by the auxiliary problem when p' Q is considered. By the 
results of the previous subsection and ([Tjt above, we consider 



the perturbed constraints 

A = \ 

fi = 



1 



-f 



i + w- — - (17) 

Thus, the corresponding Problem Q] is feasible and S only 
contains singular solutions. 

C. Case fi = 

In the limit case (1 = 0, not only all solutions are singular, 
but they share a key property. 

Proposition 3.2: Assume that, with the definition above, 
p, = 0. Then there exists a kernel JC which is common for 
all peS. 

Proof: Let us assume /i = 0, accordingly Problem 1 
does only admit singular solutions, with dimkcr(p) > 
V p G S. Pick a solution p° G S with kernel of min- 
imal dimension. Suppose by contradiction that there exists 
p G S such that ker(p°) G\ ker(p). Taking into account 
p G (0, 1), we define p := pp° + (1 — p)p G S. Accordingly 
dimkcr(p) < dimkcr^ ) which is a contradiction, since p° 
has kernel with minimal dimension on S. We conclude that 
K = kcr(p°) C ker(» V p G S. ■ 

This directly implies the following block-form for all the 
solutions to Problem 1. 

Corollary 3.1: Let p° G S be a solution with minimal 
kernel JC and consider its block-diagonal form 



p°=U 



Pl 




where U is a unitary change of basis consistent with the 
Hilbert space decomposition % = K, 1 - © JC so that p\ > 0. 
Then, the set of all the solutions of Problem Q] is 



(15) S = {p = U 



Pi 




I pi > 0, fa = ix{pXi) \ . (18) 



As consequence of Corollary 13. II we can focus on a reduced 
version of Problem Q] by considering optimization only on 
the support of p° , for which the minimum relative entropy is 
applicable since p\ is positive definite, see Section [TV] 

IV. State Estimation with Minimum Relative 
Entropy Criterion 

A. Reduced Problem 

In the previous part of the paper we showed how to check 
the feasibility of Problem [TJ given the constraints associated 
to the data and, if needed, how to relax the constraints in 
such a way that the corresponding Problem [TJ is feasible. In 
general, however, the set of solutions S is not constituted by 
only one element. In this section, we show how to choose, and 
then compute, a solution in S according the minimum quantum 
relative entropy criterion. 

Given the results of the previous sections, we can assume 
that either Problem Q] admits at least one (strictly) positive 
definite solution, or we can resort to a reduced problem for 
which a full rank solution exists. In fact, if S only contains 
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singular matrices (case p, = 0, or after relaxation of the 
constraints), by Corollary 13.11 we have that the set of solution 

is 



S 



u 



Pi o 




& I Pi > 0, fi = ti(pXi) \ (19) 



for some unitary change of basis U consistent with the Hilbert 
space partition H = JC 1 - © K,. Accordingly for each p G S, 
constraints in (|7]i can be rewritten in the following way 



fi = tr(pXi) =tr[U 



Pi 




rtX t = tr( Pl Xi) (20) 



where X l := [ I ] WXiU G H ni with m < n. 

Accordingly Problem [TJ is equivalent to the corresponding 
reduced problem with X\ ■ ■ ■ X m and fi ■ ■ ■ f m - The corre- 
sponding set of solutions is 



Si = {pi e H ni | pi > 0, ft = ti(pxXi)} 



(21) 



which contains the positive definite solution p°. Once chosen 
a solution pi E Si, the corresponding original solution is 



U 



Pi o 




(22) 



In the effort of keeping a simple notation, we will not 
distinguish between the reduced and the full problem in the 
following discussion, therefore using p for either the full or the 
reduced state, {X{\ for either the full or reduced observable, 
and n for the dimension of the full Hilbert space or the reduced 
one as needed. We can consider the following simpler problem, 
restricted to strictly positive matrices: 

Problem 5: Given the observables Xi . . . X m and the cor- 
responding estimates f\ . . . f m , solve 

minimize Sfpllr) subject to tr(pXi) = fi, i = 1 . . .m. 

(23) 

B. Lagrangian and Form of the Full-Rank Solution 

Now we are ready to derive a solution method for problem 
the entropic criterion. Consider the linear operator associated 
to the above constraints: 



L : U„ -> 



P^ 



ti( P Xi 



tr(pX m ) 



(24) 



Given A = [ \ x ... X m } T G W n and p G H„, 

m m 

(L(p),X) =J2\itv(pXi) =tv(pJ2^iXi) = (P,L*(\)) 

i=l i=l 

(25) 



where 



L* : K m -> H n 

m 

Ah>5] XiX, 



(26) 



is the adjoint operator of L. Define f = [ fi ■ ■ ■ f m 1 ■ 
Since Problem[5]is a constrained convex optimization problem, 
we take into account its Lagrangian 

£(p,X) = tr(plog/o- plogr) - (A,/ - L{p)) 

= tr(plogp- plogr) + (L*(X),p) - (A,/) 
- tr[p(logp-logr + i*(A))]-(A,/) (27) 

where A G W l is the Lagrange multiplier. Note that C(-,X) 
is strictly convex over T-L n ,+ where "H n ,+ denotes the cone 
of the positive definite matrices. Thus, its minimum point is 
given by annihilating its first variation 

SC( P ,X;Sp) = tr[(logp + 7-logr + L*(A))M(28) 

for each direction dp G 7i n . Accordingly, the unique minimum 
point for £(-, A) is 



_ Aogr-I-L'(X) 



and 



p(X) = c 



C(p(X),X) <L(p,X), Vp£U n ,+. 



(29) 



(30) 



If there exists A° such that p(X°) G S, i.e. / = L{p(X )), 
then (l30l l implies 



§(p(A )||r)<§(p||r), VpeS. 



(31) 



Thus, if we are able to find A° G M m such that f-L(p(X°))=0, 
then p(X°) is the unique solution to Problem |5] This issue is 
solved by considering the dual problem wherein A° (if there 
exists) maximizes the following functional over R m 



inf £(p,A) 



= C( P (X),X) 

= -tr(e logr - / - i *( A )) - (A,/). (32) 



The existence of such a A is proved in Section IIV-CI 
Moreover, we suggest how to efficiently compute it. 

Remark 4.1: When p = 0, S only contains singular matri- 
ces. Instead of considering the reduced problem as we did, one 
could try to consider Problem [3] with relaxed constraint p > 0. 
In this situation the Slater's condition BTI 5.2.3], however, 
does not hold because S does not contain positive definite 
matrices. Hence, we cannot conclude that p(A°) is the desired 
solution of the primal problem. Moreover, for any A, would 
be p(A°) > 0. This means that p(A°) is not the solution to 
Problem |3] 

C. Dual Problem: Existence and Uniqueness of the Solution 

The dual problem consists in maximizing ( f32t over R m 
which is equivalent to minimize 

J(A) = tr(c logT - J - L * (A) ) + (A, f) . (33) 

This functional will be referred to as dual function throughout 
this Section. Before to prove the existence of A° which 
minimizes J we need to introduce the following technical 
results. 

First of all, note that (A- 1 ,/) = for each X 1 - G 
[Range L]- 1 . In fact, if A G [Range L} 1 - = kcrL*, then 
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L*(A X ) = 0. Since S n H n ,+ ^ 0, there exists p f G W„,+ 
such that / = L(pf). Thus, 

(A\/) = (A\L( P/ )) = <Z*(A x ) )/9/ ) = tr(L*(A x ) P/ ) = 

(34) 

We conclude that A x does not affect J, i.e. 

J(A + A x ) = J(A), V X 1 - G [Range L]" 1 . (35) 

We may therefore restrict the search of the minimum point for 
J over Range L. 
Proposition 4.1: J is strictly convex over Range L. 

Proof: Since J is the opposite of C(p(X), A), it is convex 
over R m . The first and the second variation of J(A) in 
direction <5A € K m are: 

SJ(X;6X) = -tr / (c (1 - t)(losT - / - L * (A)) L*((SA) (36) 
Jo 

. e *( lo fi-- r - i *W))(it + (<5A ) /) 

= -tr / c logT - / - L '* (A) diL*(5A) + ((5A,/) 
Jo 

= -tr(e logr - / - i *( A )i*( ( 5A)) + ((5A,/) (37) 

S 2 J(X;SX) = tr[f e (1 - t)(logT - / - L * (A)) L*( ( 5A) 
Jo 

• e t(logT - 7 - L * (A)) L*((5A)di]. (38) 

Here, we exploited the expression for the differential of the 
matrix exponential (see lfT9"l Appendix IA]). Define Q t = 
e t(io g T-/-L*(A)) w jji c h i s positive definite for each tel. 
Thus, 

6 2 J(X;6X) = f tr(Q 1 ^ t L*(5X)Q t L*(6X))dt (39) 
Jo 

= / tr(Q^L*(SX)Q 1 ^ t L*{SX)Q^)dt>0. 
Jo 

Assume now that SX E Range L. If S 2 J(X;SX) = 0, then 

tr(Q* L*(6\)Qi-tL*(5\)Q$ ) = 0. Since Q t > for each 
t e t, it follows that L*(SX) = 0. Since SX G Range L, 
we get <5A = 0. We conclude that S 2 J(X;SX) > 0, for each 
<5A 7^ 0, i.e. the statement holds. ■ 

In the light of the previous result, the dual problem admits at 
most one solution, say A°, over Range L. If such a A° does 
exist, then S J(A; <5A) = V S X G Range L which is equivalent 
to -L(c logT ~ / - i *( A ° ) ) + f = 0. It means that p(X°) satisfies 
constraints in © and it is therefore the unique solution to 
Problem [5] It remains to show that such a A° does exist. 
Proposition 4.2: J admits a minimum point over Range L. 
Proof: We have to show that the continuous function J 
takes minimum value over Range L. Observing that J(0) = 
i tr(r), we can restrict our search over the closed set 

M := {A G R m | J(A) < J(0)} n Range L. 

We shall show that A4 is bounded. To this aim, consider a 
sequence {Ai} igN , A^ G RangeL such that ||Aj|| — > oo. It is 
therefore sufficient to show that J(Aj) —> oo, as i oo. First 
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of all note that the minimum singular value a of L* restricted 
to Range L = [ker L*} 1 - is strictly positive, accordingly 

||L*(Ai)|| > a||Ai|| -> oo. (40) 

This means that L*(Xi), which is an Hermitian matrix, has 
at least one eigenvalue such that \f3i\ approach infinity. 
If /3i —> — oo, then the first term of J tends to infinity and 
dominates the second one, accordingly J(Aj) —> oo. In the 
remaining possible case no eigenvalue of L(A;) approaches 
— oo and /3j — > oo. Thus, L*(Xi) > MI where M G R is 
a finite constant and the first term of J takes a finite value. 
Since S n "Hn,+ is not empty, there exists p/ G such 
that / = L(pf) and 

(XiJ) - (i*(Ai),/>/) =tr(p)L*(Ai)4) > M, (41) 

where we exploited the fact that tv(pf) = 1. This means that 

(Xi,f) cannot approach — oo. Finally, \\pj- L*(Xi)pJ\\ — > oo, 

because p$ > 0. It follows that pj L*(Xi)p^ , and hence 
L*(Aj), have at least one eigenvalue tending to oo. Accord- 
ingly J(Aj) — > oo as i — > oo. We conclude that is bounded 
and accordingly compact. By Weierstrass' theorem, J admits 
minimum point over Ai. M 

Finally, A may be computed by e.g. employing a Newton 
algorithm with backtracking, see Section VI in [32), which 
globally converges. 

V. Conclusions 

The proposed set of analytic results and algorithms provides 
a general method to find the exact minimum relative entropy 
estimate of a quantum state under general assumptions, that 
is, without requiring the constraints to include a full-rank 
solution to begin with. A general solution to the problem was 
missing for quantum estimation, and our feasibility analysis 
is of interest for the classical case as well. Summarizing, we 
have proposed: (i) A numerical method to decide the feasibility 
of Problem Q] which is of interest on his own, and compute 
the minimum necessary relaxation of the constraints whenever 
necessary to obtain at least one solution; (ii) As a side product, 
we are able to determine whether there is at least a full-rank 
solution. When this is not the case, we proved there exists, 
and devised a way to determine, the maximal common kernel 
of all the solution; (iii) We extended Georgiou's approach to 
maximum entropy estimation to our setting, and analyzed in 
depth both the primal and the dual optimization problems. The 
general form of a full-rank solution of the reduced problem, 
namely the one we obtain by removing the common kernel, 
is given explicitly and depends on the solution of the dual 
problem. The latter is proven to be a convex optimization 
problem with a unique solution, A°, which can be obtained 
by standard numerical algorithms. 

It is also possible to show, A° is continuous with respect to 
the data set / (see Appendix IB1. Since, in view of ( 1291 , p(X°) 
is continuous with respect to A , we can conclude that the 
solution to Problem is continuous with respect to the data 
fi ■ ■ ■ fm- This is of course a desirable property, and ensures 
that for a increasing accuracy of the estimates, the computed 
solution will converge to the actual state. 
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Possible extensions of the present framework include ap- 
plications to state reconstruction from local marginals and 
its connection to entanglement generation fl27l , l28l , as well 
as comparison with the existing approximate results lETI in 
physically meaningful situation. Lastly, the advantage offered 
by the introduction of an a priori state in the estimation 
problem can be exploited to devise recursive algorithms, that 
update existing estimate in an optimal way relying only on 
partial data. 

Appendix 

A. Numerical solution for the feasibility problem 

We propose a Newton-type algorithm with logarithmic 
barrier for numerically find one solution / to Problem [4] 
Using the same approach of [9, Section 4], we resort to the 
approximate problem 



mm g q (v 

£Sint(I) 

where q > 0, and 

g q (v) := qc T v — logdet(_ff(w)). 
Recall that we defined 



(42) 



(43) 



T={v\ H(v) > 0}, 

and notice that g q is continuous and strictly convex over 
the set int(I). Moreover lim^ax 9q{u) = +°° an d the 
component v n 2_ m+1 can be restricted to belong to a closed 
and bounded interval. Accordingly the approximated problem 
admits a unique solution, denoted by v q , which is numerically 
computed by employing the Newton algorithm with back- 
tracking, [31 9.5]. Here, we can choose as initial guess the 
vector v% = [ ... y/n.(-\ min (p Q ) + 1 ] e int(X). 



Concerning the Z-th Newton step, we have to solve 



AS? 



■H., VGjss 



(44) 



where 



VG*. = 



BLf, 



qc 



tr^f)- 1 ^) 

tr(i/(^)- 1 r n2 _ m 

trfJJCfi?)- 1 *!) 



(45) 



(46) 



are the gradient and the Hessian computed at vS, respectively. 
Note that, it is not difficult to prove that 

1) The sequence {v q }i>o generated by the 
algorithm is contained in the compact set 
Af = ju G int(I) | - < c T v < c T v%y, 

2) g q is twice differentiable and strongly convex on J\f; 

3) The Hessian of g q is Lipschitz continuous on M. 



Accordingly, the Newton algorithm with backtracking globally 
converges, ||3T1 9.5.3]. Moreover, the rate of convergence is 
quadratic in the last stage. Finally, note that the found solution 



satisfies the following inequalities, OTl p. 566], 



c T v° 



< c T v 



i < c T v° 



(47) 



where v° is a solution to Problem [4] and ^ the accuracy of 
c T v q with respect to the optimal value c T v°. 

This method works well only setting a moderate accuracy. 
To improve the accuracy, we can iterate the above Newton 
algorithm and in each iteration we gradually increase q in 
order to find a solution with a specified accuracy £ > 
El p. 569]: 

1) Set the initial conditions q > and v qa = 
[0 ... Vi„(po) + 1) ] T G int(2). 

2) At the fc-th iteration compute v gk £ int(X) by minimiz- 
ing g qk with starting point ifl k - 1 by using the Newton 
method previously presented. 

3) Set qk+i = flqk with j3 > 1 closer to one. 

4) Repeat steps 2 and 3 until the condition ^ < £ is 
satisfied. 

Finally, we deal with the problem to compute a solution 
to Problem Q] with kernel JC when /u = 0. In this situation, 
consider the non-empty convex set 



Y, WiYi > 



(48) 



where w = [ w\ ... w n 2_ m ] T <g R"~ m . Thus, the set 
of solutions to Problem Q] is 



S = \Po+ ^2 WiYi | w G T* 



(49) 



We wish to compute a matrix p° G S having kernel cor- 
responding to the minimum common kernel K,. To this aim 
consider the following problem. 

Problem 6: Pick u G K" ~ m at random and solve: 



w u = arg min (w — u) {w — u). 



(50) 



Note that: 

• I* is compact (i.e. closed and bounded) 

• (m~ u) T (w~ u) is continuous and strictly convex on Z». 
By Weiestrass' theorem, the above problem admits a unique 
solution w li . 

Let us define p°' u := p Q + EfcT™ < Y * € S and p u := 
p~o + X)fc=i in u i^i € S. It is easy to see that Problem [6] can 
be rewritten in terms of the matrix p° :U as follows: 



arg mm 1 1 p 

pes 



p\\f, 



(51) 



where || • \\p denotes the Frobenius matrix norm. Thus, p°' u is 
the closest matrix in S (with respect to the Frobenius norm) 
to the matrix p u belonging to the hyperplane characterized 
by the constraints (0. Hence, randomly generating a finite 
sequence {u^ ... U;} of elements in M™ ~ m we obtain a subset 
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U := {p°- Ul . . .p°' Ul } contained in S. Construct the convex 
combination p := j Y^j=i P°' Uj ■ Then p has minimal kernel 
if either one of the p°' Uj belongs to the interior of S, or p°' Uj 
belong to different faces of S, which is a compact convex set. 
By the randomized construction, the probability of remaining 
on the boundary of S becomes small as I grows. 

Concerning the computation of w u , we can resort a Newton- 
type algorithm with logarithmic barrier named exterior-point 
method, (33] Chapter 4]. 

B. Continuity of A° with respect to f 

We show that the solution A° is continuous with respect to 
the data set /. To this aim we take into account the following 
result, see l34l Theorem 3.1]. 

Theorem A.l: Let A be an open and convex subset of a 
finite-dimensional euclidean space V. Let h : A — > R be a 
strictly convex function, and suppose that a minimum point x 
of h exists. Then, for all e > 0, there exists 5 > such that, 
for p e M", ||p|| < 5, the function h p : A->R defined as 

h p (x) := h(x) - (p,x) (52) 

admits a unique minimum point x p , and moreover 

||x p -x||<£. (53) 

Consider 

J(\J) = tr(J°*T- I - L 'M) + (A, f) (54) 

where we make the dependence of J upon /. Then, the unique 
minimum point is 

A(/) = arg min J(A,/). (55) 

A(EJR 771 

Let 5f_ G R m be a perturbation of /. We have J(A, f + 5f) = 
J(A, /) + X T 5f. Applying the previous theorem, where Sf 
is -p, we have: V £ > 3 i > s.t. if ||<5/|| < 5 then 
J(A, / + Sf) admits a unique minimum point 

A(/ + Sf) = arg min J(A, / + Sf) (56) 

and 

||A(/ + £/)-A(/)|| < e . (57) 
Accordingly, the map / H> A(/) is continuous. 
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